Learning to Plan via a Multi-step Policy Regression Method
نویسندگان
چکیده
We propose a new approach to increase inference performance in environments that require specific sequence of actions order be solved. This is for example the case maze where ideally an optimal path determined. Instead learning policy single step, we want learn can predict n advance. Our proposed method called horizon regression (PHR) uses knowledge environment sampled by A2C dimensional vector distillation setup which yields sequential per observation. test our on MiniGrid and Pong show drastic speedup during time successfully predicting sequences
منابع مشابه
An EM-based Multi-Step Piecewise Surface Regression Learning Algorithm
A multi-step Expectation Maximization based (EM-based) algorithm is proposed to solve piecewise surface regression problem which has typical applications in market segmentation research, identification of consumer behavior patterns, weather patterns in meteorological research, and so on. The multiple steps involved are local regression on each data point of the training data set and a small set...
متن کاملMulti-Step Regression Learning for Compositional Distributional Semantics
We present a model for compositional distributional semantics related to the framework of Coecke et al. (2010), and emulating formal semantics by representing functions as tensors and arguments as vectors. We introduce a new learning method for tensors, generalising the approach of Baroni and Zamparelli (2010). We evaluate it on two benchmark data sets, and find it to outperform existing leadin...
متن کاملMulti-step Off-policy Learning Without Importance Sampling Ratios
To estimate the value functions of policies from exploratory data, most model-free offpolicy algorithms rely on importance sampling, where the use of importance sampling ratios often leads to estimates with severe variance. It is thus desirable to learn off-policy without using the ratios. However, such an algorithm does not exist for multi-step learning with function approximation. In this pap...
متن کاملMulti-Step Polynomial Regression Method to Model and Forecast Malaria Incidence
Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors such as age, sex, social factors, environmental variability etc. as well as underlying transmission dynamics of the disease is important for epidemiological research on malaria and its eradication. Thus, development of suitable modeling approach and methodology, based on the available d...
متن کاملLearning to Cooperate via Policy Search
Cooperative games are those in which both agents share the same payoff structure. Valuebased reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environm...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Lecture Notes in Computer Science
سال: 2021
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-030-86380-7_39